Data Science Problem

Data visualization is an essential tool that is used to communicate ideas effectively (Antony Unwin, 2020). When dealing with a large data set, more often than not, main ideas are better encapsulated in a useful and understandable visualization (Mathieu Stark, 2020).


Introduction

There are plenty of visualization packages for R. plotly was listed among the top R data visualization libraries of 2020 (Harkiran Kaur, 2020), so this vignette uses it to showcase a few ways to build interactive web plots. plotly offers a wide variety of functions; the Plotly R Open Source Graphing Library listed in the references is a good place to explore them.


Resources

Install Packages

If the packages below are already installed, feel free to skip this step. They are required to run the code in this R vignette.

# if you require the below packages to be installed, remove the '#' at the front
# install.packages("tidyverse")
# install.packages("plotly")
# install.packages("magrittr")

It is best practice to confirm that a package is installed before loading it; otherwise library() will stop with an error.
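
One way to do this check is sketched below. It is only an illustration: the names pkgs, installed and missing_pkgs are mine, and the install call is left commented out in the same style as above so that nothing is installed without your say-so.

```r
# Packages this vignette relies on.
pkgs <- c("tidyverse", "plotly", "magrittr")

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # remove the '#' at the front to install them
} else {
  message("All required packages are installed.")
}
```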

Load Packages

This section will show you how to load the packages required to run through this R Vignette. To load the packages, use the function library().

library(plotly)
library(tidyverse)
library(magrittr)

Note: tidyverse is loaded because it bundles a wide variety of R packages used for analysis (including, but not limited to, data manipulation and tidying) (Wickham et al., 2019).
Note: magrittr is also loaded because it provides the pipe and other operators that are useful when coding in R (Stefan Milton Bache, 2014).
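
As a quick illustration of what those operators do, here is a small sketch on made-up numbers (the values and the names total and ratings are only for demonstration):

```r
library(magrittr)

# x %>% f() is the same as f(x), so the steps read left to right.
total <- c(1, 4, 9) %>% sqrt() %>% sum()
total

# %<>% pipes a value through a function and assigns the result back to it.
ratings <- c(3.756, 1.204)
ratings %<>% round(1)
ratings
```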

Data

As a ramen enthusiast, I picked the CSV file from the ramen ratings dataset on Kaggle (Aleksey Bilogur, 2018). I downloaded the CSV file and saved it in a folder that I called “data.”


Plotly - A closer look

Section 1: Load the data

To take a closer look at the plotly package, we first need some data to analyse. To load the ramen data I downloaded (Aleksey Bilogur, 2018), I used the read_csv() function from readr, which is loaded as part of tidyverse, and named the result ramen.

ramen <- read_csv("data/ramen-ratings.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   `Review #` = col_double(),
##   Brand = col_character(),
##   Variety = col_character(),
##   Style = col_character(),
##   Country = col_character(),
##   Stars = col_character(),
##   `Top Ten` = col_character()
## )
head(ramen, n = 5) # shows the first five rows of this tibble
## # A tibble: 5 × 7
##   `Review #` Brand          Variety                Style Country Stars `Top Ten`
##        <dbl> <chr>          <chr>                  <chr> <chr>   <chr> <chr>    
## 1       2580 New Touch      T's Restaurant Tantan… Cup   Japan   3.75  <NA>     
## 2       2579 Just Way       Noodles Spicy Hot Ses… Pack  Taiwan  1     <NA>     
## 3       2578 Nissin         Cup Noodles Chicken V… Cup   USA     2.25  <NA>     
## 4       2577 Wei Lih        GGE Ramen Snack Tomat… Pack  Taiwan  2.75  <NA>     
## 5       2576 Ching's Secret Singapore Curry        Pack  India   3.75  <NA>

Section 2: Data Preparation

Before we can plot this data, it needs some preparation. One way to prepare your data is to make it tidy.

"In tidy data:

  1. Every column is a variable.

  2. Every row is an observation.

  3. Every cell is a single value."

(CRAN - Tidy Data)

For the plotly plot I have in mind, I need Stars to be numeric and, based on the head() output in Section 1, only some of the columns. Converting Stars with as.numeric() turns any entry that cannot be parsed as a number into NA (with a warning), and I will also remove any row that contains an NA.

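ramen$Stars <- as.numeric(ramen$Stars) # entries that are not numbers become NA
ramen_sub <- select(ramen, c(Brand, Style, Country, Stars)) %>%
  drop_na() # removes rows with an NA in any of the remaining columns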
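
To see what the conversion does before running it on the real column, here is a minimal sketch on a toy vector. The values, including the "Unrated" label, are made up for illustration and are not taken from the ramen file.

```r
# Toy character vector standing in for a ratings column (hypothetical values).
stars_raw <- c("3.75", "1", "Unrated")

# as.numeric() parses what it can and returns NA for the rest;
# suppressWarnings() hides the "NAs introduced by coercion" warning here.
stars_num <- suppressWarnings(as.numeric(stars_raw))
stars_num

# Count how many entries could not be converted.
sum(is.na(stars_num))
```

Counting the NA values this way is a cheap check on how many rows the later drop_na() step will remove because of Stars.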

A few more simple checks to see if the data is ready for plotting:

head(ramen_sub) # shows the first few rows of each column in this tibble
## # A tibble: 6 × 4
##   Brand          Style Country     Stars
##   <chr>          <chr> <chr>       <dbl>
## 1 New Touch      Cup   Japan        3.75
## 2 Just Way       Pack  Taiwan       1   
## 3 Nissin         Cup   USA          2.25
## 4 Wei Lih        Pack  Taiwan       2.75
## 5 Ching's Secret Pack  India        3.75
## 6 Samyang Foods  Pack  South Korea  4.75
str(ramen_sub) # shows the internal structure of this tibble 
## tibble [2,575 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Brand  : chr [1:2575] "New Touch" "Just Way" "Nissin" "Wei Lih" ...
##  $ Style  : chr [1:2575] "Cup" "Pack" "Cup" "Pack" ...
##  $ Country: chr [1:2575] "Japan" "Taiwan" "USA" "Taiwan" ...
##  $ Stars  : num [1:2575] 3.75 1 2.25 2.75 3.75 4.75 4 3.75 0.25 2.5 ...
summary(ramen_sub) # shows some descriptive statistics of your tibble
##     Brand              Style             Country              Stars      
##  Length:2575        Length:2575        Length:2575        Min.   :0.000  
##  Class :character   Class :character   Class :character   1st Qu.:3.250  
##  Mode  :character   Mode  :character   Mode  :character   Median :3.750  
##                                                           Mean   :3.655  
##                                                           3rd Qu.:4.250  
##                                                           Max.   :5.000

For the sake of this example, I will use the mean of Stars for each Country and Style combination, rounded to two decimal places.

final_ramen <- ramen_sub %>%
  group_by(Country, Style) %>%
  summarise(Stars = round(mean(Stars), digits = 2),
            .groups = "drop") # return an ungrouped tibble
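
The same grouping step on a tiny, made-up data frame shows the shape of the result: one row per Country and Style combination. The data and the names toy and toy_means are hypothetical.

```r
library(dplyr)

# Toy ratings (made-up values, not from the ramen file).
toy <- data.frame(
  Country = c("Japan", "Japan", "Japan", "Japan", "Japan", "USA"),
  Style   = c("Cup", "Cup", "Cup", "Pack", "Pack", "Cup"),
  Stars   = c(3.75, 4.5, 5, 3, 4, 2.25),
  stringsAsFactors = FALSE
)

# One rounded mean per Country/Style pair.
toy_means <- toy %>%
  group_by(Country, Style) %>%
  summarise(Stars = round(mean(Stars), digits = 2), .groups = "drop")

toy_means
```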

Section 3: Finally! Let’s Plot

For this example, I have chosen a bubble plot to visualize my ramen data. (A bubble plot is essentially a scatter plot in which the size and colour of each “bubble” encode additional variables.)

The fun element about plotly visuals is the ability to interact with them (Plotly R Open Source Graphing Library). Try hovering over the plot or zooming in and out.

fig1 <- plot_ly(final_ramen, x = ~Country, y = ~Style, color = ~Stars, 
                size = ~Stars, text = ~Stars, type = 'scatter', mode = 'markers', 
                hovertemplate = paste(
                  'Stars: %{text}',
                  '<br>%{x}<extra></extra>'
                ),
                marker = list(
                  # 'colors' is a plot_ly() argument, not a marker attribute,
                  # so it is not set here
                  opacity = 0.5 # semi-transparent bubbles
                  )
                )
fig1 <- fig1 %>% layout(title = 'Ramen Ratings by Country of Origin and Style', 
                        xaxis = list(showgrid = FALSE), 
                        yaxis = list(showgrid = FALSE)
                        )

fig1
  • type = 'scatter' and mode = 'markers' draw the data as points, and mapping size = ~Stars scales those points to give the “bubble” look.
  • The marker = list(*) portion of the code can be customized to change how the “bubbles” look.
  • Setting showgrid = FALSE hides the grid lines for that axis.
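
As a small sketch of that kind of customization, the example below builds a bubble plot from a made-up data frame and uses the sizes argument of plot_ly() to set the range of bubble sizes. The data, the object names and the chosen values are assumptions for illustration only.

```r
library(plotly)

# Toy summary data (made-up values, not from the ramen file).
toy <- data.frame(
  Country = c("Japan", "Japan", "USA"),
  Style   = c("Cup", "Pack", "Cup"),
  Stars   = c(4.42, 3.5, 2.25),
  stringsAsFactors = FALSE
)

fig_toy <- plot_ly(
  toy, x = ~Country, y = ~Style,
  color = ~Stars, size = ~Stars,
  sizes = c(10, 40),             # smallest and largest bubble size
  type = "scatter", mode = "markers",
  marker = list(opacity = 0.7)   # less transparent than the plot above
)

fig_toy
```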

Section 4: Another Plot

One of my favourite plotly plots is the choropleth (a map-based interactive plot).

# a choropleth draws one value per location, so average over all styles first
country_ramen <- ramen_sub %>%
  group_by(Country) %>%
  summarise(Stars = round(mean(Stars), digits = 2))

fig2 <- plot_ly(country_ramen, # data to plot (one row per country)
                type = 'choropleth', # specifies a choropleth plot
                locations = ~Country, # column that contains locations
                locationmode = 'country names', # locations label type 
                z = ~Stars, zmin = 1, zmax = 5, # this is what will be featured in the colorbar
                colorscale = 'Viridis', # the color I chose for the colorbar
                marker = list(
                  line = list(
                    color = "grey", width = 0.5 # line details separating the locations
                  )
                )
                )
fig2 <- fig2 %>% layout(title = 'Avg Ramen Ratings by Country of Origin'
                        )
fig2 <- fig2 %>% colorbar(title = "Avg Star Rating",
                          position = "bottomright"
                        )

fig2
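
Because a choropleth shows a single value per location, it can help to confirm that the locations column has no repeated entries before plotting. A minimal sketch of such a check on a made-up data frame (the data and names are hypothetical):

```r
# Toy summary with one row per Country/Style pair (made-up values).
toy_means <- data.frame(
  Country = c("Japan", "Japan", "USA"),
  Style   = c("Cup", "Pack", "Cup"),
  Stars   = c(4.42, 3.5, 2.25),
  stringsAsFactors = FALSE
)

# Countries that appear more than once would supply several values
# for the same location on the map.
dup_countries <- unique(toy_means$Country[duplicated(toy_means$Country)])
dup_countries
```

If this returns any countries, the data should be summarised to one row per country before it is passed to the choropleth.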

The code for this plot has a slightly steep learning curve; more examples can be found in the Plotly R documentation (Plotly R Open Source Graphing Library).


Conclusion

As you can see, interactive web graphs provide a fun and informative way to visualize your data. What you see in this vignette is but a sampling of what’s on offer.


References

Aleksey Bilogur. (2018). Ramen ratings. https://www.kaggle.com/residentmario/ramen-ratings
Antony Unwin. (2020). Why is data visualization important? What is important in data visualization? University of Augsburg. https://hdsr.mitpress.mit.edu/pub/zok97i7p/release/3
CRAN - Tidy data. https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
Harkiran Kaur. (2020). Top R libraries for data visualization in 2020. Geeks for Geeks. https://www.geeksforgeeks.org/top-r-libraries-for-data-visualization-in-2020/
Mathieu Stark. (2020). Why data visualization is important. Analytiks. https://analytiks.co/importance-of-data-visualization/
Plotly R open source graphing library. Plotly Technologies Inc. https://plotly.com/r/
Stefan Milton Bache. (2014). Simpler R coding with pipes > the present and future of the magrittr package. https://www.r-statistics.com/tag/stefan-milton-bache/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686